Incorporating Visual Information into Sound Source Separation

Authors

  • Hiroshi G. Okuno
  • Yukiko Nakagawa
  • Hiroaki Kitano
Abstract

We present a method of improving sound source separation using vision. Sound source separation is an essential function for auditory scene understanding: it separates the stream of sounds generated by multiple sound sources. Once the sounds are separated, a recognition process such as speech recognition can work on a single stream rather than on the mixed voices of several speakers. Separation performance is known to improve when a binaural microphone or a microphone array supplies spatial information. However, these methods still leave a positional ambiguity of around 20 degrees. In this paper, we add visual information to provide more specific and accurate position information. As a result, separation capability improved drastically. Based on our experiments, we argue that integrating visual and auditory sensory inputs improves cognitive tasks such as auditory stream separation.
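The abstract's central argument is that a sharper position estimate makes separation easier. It can be illustrated with a small sketch under stated assumptions. The sketch assumes a hypothetical direction-based scheme in which each time-frequency bin already carries an estimated azimuth. It also assumes that auditory and visual azimuth estimates are combined by inverse-variance weighting. Neither the function names (`fuse_azimuth`, `direction_pass`) nor the fusion rule come from the paper; they are illustrative assumptions. The bin azimuths are toy values, not experimental data.

```python
from typing import List, Tuple


def fuse_azimuth(audio_deg: float, audio_sd: float,
                 vision_deg: float, vision_sd: float) -> Tuple[float, float]:
    """Combine two independent azimuth estimates by inverse-variance weighting.

    Returns the fused azimuth (degrees) and its standard deviation.
    Assumes both estimates are unbiased and Gaussian; this fusion rule is an
    illustrative assumption, not the paper's method.
    """
    w_audio = 1.0 / audio_sd ** 2
    w_vision = 1.0 / vision_sd ** 2
    fused = (w_audio * audio_deg + w_vision * vision_deg) / (w_audio + w_vision)
    fused_sd = (w_audio + w_vision) ** -0.5
    return fused, fused_sd


def direction_pass(bin_azimuths: List[float], target_deg: float,
                   tolerance_deg: float) -> List[bool]:
    """Binary mask: keep a time-frequency bin if its estimated azimuth lies
    within +/- tolerance_deg of the target direction.

    Azimuths are assumed to lie in the frontal range [-90, 90] degrees,
    so no wrap-around handling is needed.
    """
    return [abs(az - target_deg) <= tolerance_deg for az in bin_azimuths]


if __name__ == "__main__":
    # Hypothetical per-bin azimuths: the first three bins are dominated by a
    # target speaker near 0 degrees, the last three by an interferer near 15.
    bins = [-2.0, 1.0, 3.0, 13.0, 16.0, 14.0]

    # Audio only: a 20-degree ambiguity lets the interferer's bins through.
    audio_only = direction_pass(bins, target_deg=0.0, tolerance_deg=20.0)
    print(audio_only)  # [True, True, True, True, True, True]

    # Audio + vision: fuse a coarse auditory estimate with a sharper visual
    # one, then gate at roughly two standard deviations of the fused estimate.
    target, sd = fuse_azimuth(audio_deg=6.0, audio_sd=10.0,
                              vision_deg=0.0, vision_sd=2.0)
    fused_mask = direction_pass(bins, target_deg=target, tolerance_deg=2 * sd)
    print(round(target, 2), round(sd, 2))  # 0.23 1.96
    print(fused_mask)  # [True, True, True, False, False, False]
```

With the audio-only tolerance, the interfering speaker's bins leak into the target's mask. The narrower tolerance supported by the fused estimate keeps only the target's bins. A real system would typically estimate per-bin azimuths from interaural phase or intensity differences, which this sketch simply takes as given.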


Similar resources

Bayesian Source Separation and Localization

The problem of mixed signals occurs in many different contexts; one of the most familiar being acoustics. The forward problem in acoustics consists of finding the sound pressure levels at various detectors resulting from sound signals emanating from the active acoustic sources. The inverse problem consists of using the sound recorded by the detectors to separate the signals and recover the orig...


Incorporating Audio Signals into Constructing a Visual Saliency Map

The saliency map has been proposed to identify regions that draw human visual attention. Differences of features from the surroundings are hierarchically computed for an image or an image sequence in multiple resolutions and they are fused in a fully bottom-up manner to obtain a saliency map. A video usually contains sounds, and not only visual stimuli but also auditory stimuli attract human att...


Using Vision to Improve Sound Source Separation

We present a method of improving sound source separation using vision. The sound source separation is an essential function to accomplish auditory scene understanding by separating a stream of sounds generated from multiple sound sources. By separating a stream of sounds, recognition process, such as speech recognition, can simply work on a single stream, not mixed sound of several speakers. The ...


Audio Source Separation by Probabilistic Latent Component Analysis

The problem of audio source separation from a monophonic sound mixture having known instrument types but unknown timbres is presented. An improvement to the Probabilistic Latent Component Analysis (PLCA) source separation method is proposed. The technique uses a basis function dictionary to produce a first round PLCA source separation. The PLCA weights are then refined by incorporating note ons...


Audiovisual source separation

Blind source separation (BSS) can be seen as a generalization of denoising a noisy signal when several sensors are available. Each of them records the same physical phenomenon in a different way: such a diversity is then useful to separate the present signals for instance by independent component analysis (ICA) or sparse component analysis (SCA) [1]. The main objective of speech separation/extr...

Publication date: 1999